1 Productivity Measures

Refereed  papers  submitted  but  not  yet  published:  1 
Refereed  papers  published:  0 
Unrefereed reports and articles: 6 (Most of these were conference papers for
which 2- to 6-page extended abstracts were refereed.)


Books  or  parts  thereof  submitted  but  not  yet  published:  0 


Books  or  parts  thereof  published:  0 


Patents  filed  but  not  yet  granted:  0 


Patents  granted:  0 


Invited  presentations:  1 


Contributed  presentations:  4 

Honors  received:  1  (Douglas  Appelt  was  elected  vice-president  of  the  Association  for 
Computational  Linguistics  for  1994.) 

Prizes  or  awards  received:  0 


Promotions obtained: 1 (Patti Price was promoted to Director of SRI’s Speech
Research Program.)

Graduate  students  supported  >  25%  of  full  time:  3 
Post-docs  supported  >  25%  of  full  time:  1 
Minorities  supported:  0 
2 Summary of Technical Progress

Under this effort, SRI has developed spoken-language technology for interactive
problem solving, featuring real-time performance for vocabularies of up to
several thousand words, high semantic accuracy, habitability within the domain,
and robustness to many sources of variability. Although the technology is
suitable for many applications, efforts to date have focused on developing an
Air Travel Information System (ATIS) prototype application. SRI’s ATIS system
has been evaluated in four ARPA benchmark evaluations, and has consistently
been at or near the top in performance. These achievements are the result of
SRI’s technical progress in speech recognition, natural-language processing,
and speech and natural-language integration.

SRI’s DECIPHER is a continuous-speech, speaker-independent speech recognizer
based on tied-mixture, gender-dependent hidden Markov model (HMM) technology.
It uses six cepstra- and energy-based features generated from a filterbank
computed via fast Fourier transforms and high-pass filtering in the
log-spectral-energy domain. Noise-robust modeling and spectral normalization
algorithms are used to improve robustness to channel variation. Pronunciation
variability is modeled through probabilistically pruned linguistic rules.
Cross-word acoustic and phonological models are used to model coarticulation.
Recognizers trained separately on male and female speech are run in parallel,
and backed-off bigram and trigram language models are used to reduce
perplexity. Current research focuses on improving the consistency of
recognition hypotheses by reducing frame-to-frame independence assumptions,
adapting acoustic models to speakers, and improving the estimation and use of
statistical language models.
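The backed-off language models mentioned above can be illustrated with a minimal sketch. The report does not say which discounting scheme DECIPHER uses, so the absolute discounting below, the `discount` value, the toy corpus, and the name `train_backoff_bigram` are all assumptions made for illustration, not SRI's implementation.

```python
from collections import Counter


def train_backoff_bigram(sentences, discount=0.5):
    """Return (prob, vocab) for a backed-off bigram model.

    Assumption: absolute discounting with a fixed discount; the report
    does not specify the smoothing actually used in DECIPHER.
    """
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    # Group bigram counts by history word for direct lookup.
    followers = {}
    for (prev, word), count in bigrams.items():
        followers.setdefault(prev, {})[word] = count

    def p_unigram(word):
        return unigrams[word] / total

    def prob(prev, word):
        seen = followers.get(prev)
        if not seen:
            # History never observed: fall back to the unigram estimate.
            return p_unigram(word)
        history_count = sum(seen.values())
        if word in seen:
            # Observed bigram: discounted relative frequency.
            return (seen[word] - discount) / history_count
        # Unseen bigram: share the reserved mass among unseen words in
        # proportion to their unigram probabilities.
        reserved = discount * len(seen) / history_count
        unseen_mass = 1.0 - sum(p_unigram(v) for v in seen)
        if unseen_mass <= 0.0:
            return 0.0
        return reserved * p_unigram(word) / unseen_mass

    return prob, sorted(unigrams)


# Toy corpus (hypothetical, not ATIS data).
corpus = [["show", "me", "flights"], ["show", "me", "fares"], ["list", "flights"]]
prob, vocab = train_backoff_bigram(corpus)
```

For a history such as `me`, continuations seen in training receive a discounted relative frequency, and the mass freed by discounting is spread over unseen words, so the conditional distribution over the vocabulary still sums to one.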

SRI’s research on natural-language understanding for spoken-language systems
has proceeded along two lines. The shorter-term line of research has focused on
the Template Matcher, a module that constructs database queries by searching
the user input for key words and phrases characteristic of the most common
query types for a given task, ignoring parts of the input that it does not
understand. This approach is robust to unanticipated variation in phrasing, but
is limited in its ability to extract information that depends on structural
relationships among phrases. A longer-term effort is focused on more
sophisticated syntactic and semantic analysis of the input, using a
unification-grammar-based natural-language processing system called GEMINI.
GEMINI is capable of analyzing more complex semantic relationships than the
Template Matcher, but is more fragile, by itself, to unanticipated variation in
query phrasing. To get the benefits of both approaches, the ATIS system we
currently use in benchmark evaluations incorporates both GEMINI and the
Template Matcher, by first attempting to construct a complete analysis of a
query using GEMINI, and falling back on the Template Matcher if that fails.
That way, GEMINI gets a chance to give an exact analysis of the input before
the Template Matcher attempts an approximate one.
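The fallback arrangement described above can be sketched in a few lines. Everything in this sketch is a stand-in: `full_parse` plays the role of GEMINI with a single hard-coded pattern, `template_match` plays the role of the Template Matcher with a toy keyword scan, and the city list and query format are invented for illustration.

```python
CITIES = {"atlanta", "boston", "denver"}  # toy vocabulary (assumed)


def full_parse(words):
    """Stand-in for a complete analysis: succeeds only on the exact
    pattern 'show me flights from <city> to <city>'."""
    if (len(words) == 7
            and words[:4] == ["show", "me", "flights", "from"]
            and words[5] == "to"
            and words[4] in CITIES and words[6] in CITIES):
        return {"type": "flights", "origin": words[4], "destination": words[6]}
    return None


def template_match(words):
    """Stand-in for keyword-based matching: fill slots from key words
    and phrases, ignoring everything else in the input."""
    if "fares" in words or "fare" in words:
        query = {"type": "fares"}
    elif "flights" in words or "flight" in words:
        query = {"type": "flights"}
    else:
        return None
    for prev, word in zip(words, words[1:]):
        if word in CITIES and prev == "from":
            query["origin"] = word
        elif word in CITIES and prev == "to":
            query["destination"] = word
    return query


def understand(utterance):
    """Try the exact analysis first; fall back to the approximate one."""
    words = utterance.lower().split()
    analysis = full_parse(words)
    if analysis is not None:
        return analysis, "full_parse"
    return template_match(words), "template_match"
```

A well-formed query is handled by `full_parse`, while an input with hesitations such as "um what are the fares from boston uh to denver please" fails the exact pattern and is still answered by `template_match`, which skips the words it does not recognize.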

The  integration  of  speech  and  language  processing  in  SRI’s  current  ATIS  system 
is  a  serial  connection,  using  the  robustness  of  the  Template  Matcher  to  accommodate 
recognition  errors.  This  resulted  in  a  system  where  only  21.6%  of  utterances  in  the 
November  1992  ATIS  SLS  benchmark  test  failed  to  receive  a  correct  answer,  even 
though  there  was  at  least  one  recognition  error  on  33.8%  of  the  utterances.  Longer- 
term  research  is  exploring  the  use  of  the  GEMINI  system  to  guide  the  recognizer  to 
favor  more  semantically  meaningful  recognition  hypotheses.  Our  current  research  in 
this  area  focuses  on  maintaining  robustness  by  making  use  of  information  provided  by 
the natural-language system even when it fails to obtain a complete semantic analysis.

Another area of research in speech and natural-language integration in which we
have achieved significant results is the detection and correction of verbal
repairs: cases of self-editing where the speaker intends part of what was said
to be overridden by subsequent speech; for example, Show me flights {that lea-}
that arrive before seven pm. In this effort, we have developed a notation for
describing and annotating repairs, analyzed the repairs occurring in a
10,000-utterance training set of ATIS data, and developed preliminary methods
to recognize and correct repairs by combining string matching with acoustic and
natural-language information sources. In addition, we have incorporated a
component based on those methods into the GEMINI system.
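The string-matching part of repair correction can be illustrated with a deliberately simple sketch. It covers only one pattern, a word fragment followed by a word that already occurred shortly before it, and it uses no acoustic or natural-language evidence, so it is far cruder than the methods described above. The function name, the trailing-hyphen convention for fragments, and the `window` parameter are assumptions.

```python
def correct_fragment_repairs(words, window=4):
    """Remove word fragments (tokens ending in '-') and, when the word
    after the fragment also occurs within `window` words before it,
    remove the material from that earlier occurrence onward."""
    corrected = []
    for i, word in enumerate(words):
        if word.endswith("-") and len(word) > 1:
            next_word = words[i + 1] if i + 1 < len(words) else None
            # Search backward for the nearest earlier copy of next_word.
            lowest = max(0, len(corrected) - window)
            for k in range(len(corrected) - 1, lowest - 1, -1):
                if corrected[k] == next_word:
                    del corrected[k:]  # drop the overridden words
                    break
            continue  # the fragment itself is always dropped
        corrected.append(word)
    return corrected


utterance = "show me flights that lea- that arrive before seven pm".split()
print(" ".join(correct_fragment_repairs(utterance)))
```

On the example from the text, the fragment `lea-` is followed by `that`, which matches the `that` just before the fragment, so the overridden span "that lea-" is deleted and the remaining words form the intended query.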

During the current reporting period of the project, we have:

• Participated in ARPA-sponsored ATIS benchmarks: SRI achieved a 9.1% word-
error rate for speech recognition, a 23.6% weighted utterance error for
natural-language understanding, and a 33.2% weighted utterance error for
speech understanding.

•  Prepared  and  delivered  to  ARPA  demonstrations  of  spoken-language  technology 
(voice  banking,  ATIS,  and  dictation)  along  with  associated  online  background 
material. 

• Carried out experiments suggesting a potential 30-40% reduction in word-error
rate by incorporating natural-language constraints from GEMINI into DECIPHER.

• Increased the speed of the GEMINI parser by a factor of four by improved
handling of semantic selectional restrictions.

• Developed a method for constructing telephone acoustic models from a high-
quality speech corpus.

• Developed a new method for speaker adaptation based on Bayesian reestimation
of tied-Gaussian mixture weights.

• Developed trigram-based language models for the ATIS task.

•  Revised  the  method  for  generating  progressive  search  lattices  to  facilitate  the 
use  of  natural-language  knowledge  sources  to  constrain  speech  recognition. 

• Developed techniques for using the Template Matcher to constrain speech
recognition.

• Collected spontaneous speech data in the ATIS domain from 47 speakers for
190 scenarios.
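As an illustration of the speaker-adaptation item in the list above, the following sketch shows one common form of Bayesian reestimation of mixture weights: a MAP-style update under a Dirichlet prior centered on the speaker-independent weights. The report does not give the actual formula, so the update rule, the `prior_strength` parameter, and the function name are assumptions.

```python
def adapt_mixture_weights(prior_weights, occupancy_counts, prior_strength=10.0):
    """MAP-style update of mixture weights (assumed form):

        w_k = (tau * w_k_prior + c_k) / (tau + sum_j c_j)

    prior_weights    -- speaker-independent weights, summing to 1
    occupancy_counts -- soft counts per Gaussian from the new speaker's data
    prior_strength   -- tau, how many observations the prior is worth
    """
    if len(prior_weights) != len(occupancy_counts):
        raise ValueError("weights and counts must have the same length")
    total = prior_strength + sum(occupancy_counts)
    return [
        (prior_strength * w + c) / total
        for w, c in zip(prior_weights, occupancy_counts)
    ]


# With no adaptation data the prior weights are returned unchanged;
# with more data the estimate moves toward the speaker's own statistics.
speaker_independent = [0.5, 0.3, 0.2]
adapted = adapt_mixture_weights(speaker_independent, [30.0, 0.0, 0.0])
```

Because the Gaussians themselves are tied and left untouched, only a small number of weights per state is reestimated, which is why an update of this kind can work with little adaptation data.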
4 Transitions and DoD Interactions

SRI delivered to ARPA demonstration versions of three prototype systems (voice
banking, ATIS, and dictation) for promotion and for the education of potential
applications developers, users, and funders. SRI’s ATIS system has been
delivered to NIST for collection of training and test data for benchmark
evaluations, and we have trained NIST personnel in data collection. Finally,
SRI participated in the ARPA-sponsored Spoken Language Technology Applications
Day in Washington, DC, in April 1993, demonstrating applications of our
spoken-language technology.
5 Software and Hardware Prototypes

We have developed an Air Travel Information System (ATIS) as a prototype
application of spoken-language understanding. This system enables a user to
retrieve airline schedules, fares, and related information by means of spoken
natural-language queries. We have evaluated this system in four ATIS benchmark
evaluations, and we have also used it for data collection.

We have also developed GEMINI, a parsing and semantic interpretation system
based on unification grammar. GEMINI first applies syntactic, semantic, and
lexical rules bottom-up, using an all-paths “constituent” parser to populate a
chart with edges containing syntactic and semantic information. Then, a second
“utterance” parser is used to apply a second set of syntactic and semantic
rules that are required to span the entire utterance. If no semantically
acceptable utterance-spanning edges are found during this phase, a component
to recognize and correct verbal repairs is applied. When an acceptable
interpretation is found, a set of parse preferences is used to choose a single
best interpretation from the chart to be used for subsequent processing.
Quantifier scoping rules are applied to this best interpretation to produce a
scoped logical form.
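The first stage of this pipeline, bottom-up all-paths chart parsing followed by a check for an utterance-spanning edge, can be sketched as follows. The sketch substitutes a toy context-free grammar in binary form for GEMINI's unification grammar and carries no semantic information on its edges; the lexicon, the rules, and the function names are invented for illustration.

```python
from collections import defaultdict

# Toy lexicon and binary rules (assumed; not GEMINI's grammar).
LEXICON = {
    "show": {"V"}, "me": {"NP"}, "flights": {"NP"},
    "to": {"P"}, "boston": {"NP"},
}
BINARY_RULES = {
    ("V", "NP"): {"VP"},
    ("VP", "NP"): {"S"},
    ("P", "NP"): {"PP"},
    ("NP", "PP"): {"NP"},
}


def build_chart(words):
    """Populate a chart mapping (start, end) spans to the set of
    categories derivable over that span, working bottom-up."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart


def has_spanning_edge(words, goal="S"):
    """True if some edge of category `goal` spans the whole utterance."""
    return goal in build_chart(words)[(0, len(words))]
```

For "show me flights to boston" the chart contains a spanning `S` edge. For "show me flights uh to boston" it does not, although partial edges such as an `S` over the first three words are still in the chart; this is the situation in which the text above describes falling back to repair correction.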

In our efforts to commercialize the spoken-language systems technology
developed in part under this effort, we have developed a Trader’s Workstation
for a large Wall Street firm, which they are currently evaluating.